auto_annot_Haber2017_with_Smillie2019_dblabel_level3

Specify folders where .h5ad files are found and their names.

List the datasets that are already annotated and should be used for training. If you use only one dataset, still pass it as a one-element list.

The dataset of interest that should be annotated.
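As a minimal sketch of the setup described above, the paths can be assembled as follows. All folder and file names here are hypothetical placeholders, not the ones used in this analysis.

```python
import os

# Hypothetical folder and file names; replace them with your own.
train_folder = os.path.join("data", "train")
test_folder = os.path.join("data", "test")

# Training datasets are always given as a list, even if there is only one.
train_files = ["training_dataset_1.h5ad"]
test_file = "dataset_to_annotate.h5ad"

train_paths = [os.path.join(train_folder, name) for name in train_files]
test_path = os.path.join(test_folder, test_file)

# Fail early if a file name does not have the expected extension.
for path in train_paths + [test_path]:
    if not path.endswith(".h5ad"):
        raise ValueError(f"expected an .h5ad file, got {path!r}")
```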

Define the level of the dblabel reference annotation.

Give your analysis a name.

Now specify parameters

Specify the column name of the cell type annotation you want to train on.

Choose a method:

Specify the merge method if you use multiple training datasets. It must be either scanorama or naive.

Decide whether to use the raw data or only highly variable genes. Using raw data increases computation time and does not necessarily improve predictions.

You can choose to only consider a subset of genes from a signature set.
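The parameters described above could be collected and checked in one place, as in the sketch below. The dictionary keys and example values are assumptions made for illustration; only the restriction of the merge method to scanorama or naive comes from the text.

```python
# Hypothetical parameter names and values; adapt them to your own analysis.
params = {
    "celltype_column": "celltype",    # column holding the annotation to train on
    "method": "logistic_regression",  # classifier choice (illustrative value)
    "merge": "scanorama",             # must be "scanorama" or "naive"
    "use_raw": False,                 # False: use highly variable genes only
    "genes_to_use": "all",            # or a list of genes from a signature set
}

ALLOWED_MERGE = {"scanorama", "naive"}


def validate_params(p):
    """Raise ValueError if a parameter is outside its documented choices."""
    if p["merge"] not in ALLOWED_MERGE:
        raise ValueError(f"merge must be one of {sorted(ALLOWED_MERGE)}, got {p['merge']!r}")
    if not isinstance(p["use_raw"], bool):
        raise ValueError("use_raw must be True or False")
    if p["genes_to_use"] != "all" and not isinstance(p["genes_to_use"], list):
        raise ValueError("genes_to_use must be 'all' or a list of gene names")
    return p


validate_params(params)
```

Validating up front keeps a mistyped merge method from surfacing only after the datasets have been loaded.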

Translate the cell type annotation to a lower dblabel level.
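One way to picture this translation is a lookup from fine-grained labels to their coarser parents. The sketch below assumes that a lower level means a coarser label; the mapping and the label names are invented for illustration and are not taken from the dblabel reference.

```python
# Hypothetical mapping from fine labels to coarser parent labels.
fine_to_coarse = {
    "naive B cell": "B cell",
    "memory B cell": "B cell",
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
}


def translate_labels(labels, mapping):
    """Map each label to its coarser parent; labels without an entry are kept."""
    return [mapping.get(label, label) for label in labels]


fine = ["naive B cell", "CD8 T cell", "enterocyte"]
coarse = translate_labels(fine, fine_to_coarse)
print(coarse)  # ['B cell', 'T cell', 'enterocyte']
```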

Prepare the training sets and the test set.

This function merges the training datasets, removes unwanted genes, and, if scanorama is used, corrects for differences between datasets.
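The gene-filtering part of this step can be sketched without any single-cell library. The helper below is hypothetical and is not the function used in this analysis: it assumes that merging keeps only genes present in every dataset, optionally restricted to a signature set, and it does not attempt the scanorama correction.

```python
def shared_genes(gene_lists, signature=None):
    """Return genes found in every dataset, in the order of the first dataset.

    If a signature is given, only genes that are also in it are kept.
    """
    common = set(gene_lists[0]).intersection(*gene_lists[1:])
    if signature is not None:
        common &= set(signature)
    return [gene for gene in gene_lists[0] if gene in common]


# Toy gene names, for illustration only.
datasets = [["A", "B", "C"], ["B", "C", "D"], ["C", "B"]]
print(shared_genes(datasets))                   # ['B', 'C']
print(shared_genes(datasets, signature=["C"]))  # ['C']
```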

Train the classifier.

The returned scaler is fitted on the training dataset and standardizes each feature to zero mean and unit variance.
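To illustrate what fitting on the training set means, here is a minimal pure-Python standardizer. It is a sketch in the spirit of a standard scaler, not the scaler returned by the training step: the means and standard deviations come from the training data only and are then reused unchanged on the test data.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Compute per-feature mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Constant features get a scale of 1.0 to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with previously fitted means and standard deviations."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
test = [[3.0, 12.0]]

means, stds = fit_scaler(train)              # fitted on training data only
scaled_train = transform(train, means, stds)
scaled_test = transform(test, means, stds)   # same parameters reused
print(scaled_test)  # [[0.0, 2.0]]
```

Reusing the training parameters on the test set keeps both in the same feature space, which is why the scaler is returned alongside the classifier.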

Prediction

Use the fitted model to predict cell types in adata_pred. The prediction is added in a new column called 'auto_annot'. The paths are needed because adata_pred will revert to its original state (all genes, no additional corrections). For SVM, set the threshold to 0 or leave it out. For logistic regression, a threshold can be set.
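The role of the threshold can be sketched as follows, assuming it acts on the highest class probability of each cell and that cells falling below it receive a placeholder label. The function name, the "unknown" label, and the use of a greater-or-equal comparison are assumptions for illustration, not the library's documented behavior.

```python
def predict_with_threshold(probabilities, classes, threshold=0.0, unknown="unknown"):
    """Pick the most probable class per cell, or `unknown` below the threshold."""
    labels = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best] if row[best] >= threshold else unknown)
    return labels


# Toy class probabilities for three cells (illustrative values).
classes = ["B cell", "T cell"]
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]

print(predict_with_threshold(probs, classes, threshold=0.7))
# ['B cell', 'unknown', 'T cell']
print(predict_with_threshold(probs, classes))
# ['B cell', 'B cell', 'T cell']
```

With a threshold of 0 every cell receives its most probable label, which matches the recommendation for classifiers that do not output calibrated probabilities.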

Write the metrics to a report file, and create confusion matrices and comparative UMAP plots.
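As a rough illustration of the metrics involved, the sketch below builds a confusion matrix and an accuracy score from two label lists. It is a simplified stand-in with toy labels, not the reporting function used here, and it does not produce plots.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return sorted labels and a matrix with true labels as rows, predictions as columns."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


true = ["B cell", "B cell", "T cell", "T cell"]
pred = ["B cell", "T cell", "T cell", "T cell"]

labels, matrix = confusion_matrix(true, pred)
print(labels)                # ['B cell', 'T cell']
print(matrix)                # [[1, 1], [0, 2]]
print(accuracy(true, pred))  # 0.75
```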

Convert to html